The IBM BOLT speech transcription system

نویسندگان

  • Samuel Thomas
  • George Saon
  • Hong-Kwang Jeff Kuo
  • Lidia Mangu
چکیده

We describe the IBM automatic speech recognition (ASR) system for the DARPA Broad Operational Language Translation (BOLT) program. The system is used to transcribe conversational telephone speech (CTS) prior to machine translation for Phase 3 of the program’s Activity A. The ASR system is a combination of novel sequence trained ensemble deep neural network acoustic models on speaker adapted features and convolutional neural network models on two kinds of spectro-temporal representations of speech, in conjunction with a variety of class, neural network and n-gram based language models. Acoustic and language models for the recognition system are built on transcribed audio released under the program and further optimized for the final machine translation task as well. The evaluation system has a word error rate of 32.7% on a 2 hour Egyptian Arabic development set for this task.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The IBM speech activity detection system for the DARPA RATS program

We present the IBM speech activity detection system that was fielded in the phase 2 evaluation of the DARPA RATS (robust automatic transcription of speech) program. Key ingredients of the system are: multi-pass HMM Viterbi segmentation, fusion of multiple feature streams, file-based and speech-based normalization schemes, the use of regular and convolutional deep neural networks, and model fusi...

متن کامل

Transcription of broadcast news-system robustness issues and adaptation techniques

This paper describes some of the main problems and issues speci c to the transcription of broadcast news and describes some of the methods for solving them that have been incorporated into the IBM Large Vocabulary Continuous Speech Recognition System

متن کامل

The IBM RT06s Evaluation System for Speech Activity Detection in CHIL Seminars

In this paper, we describe the IBM system submitted to the NIST Rich Transcription Spring 2006 (RT06s) evaluation campaign for automatic speech activity detection (SAD). This SAD system has been developed and evaluated on CHIL lecture meeting data using far-field microphone sensors, namely a single distant microphone (SDM) configuration and a multiple distant microphone (MDM) condition. The IBM...

متن کامل

The cambridge university 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation

This paper presents the development of the 2014 Cambridge University conversational telephone Mandarin Chinese LVCSR system for the DARPA BOLT speech translation evaluation. A range of advanced modelling techniques were employed to both improve the recognition performance and provide a suitable integration with the translation system. These include an improved system combination technique using...

متن کامل

Current status of the IBM Trainable Speech Synthesis System

This paper describes the current status of the IBM Trainable Speech Synthesis System. The system is a state-of-the-art, trainable, unit-selection based concatenative speech synthesiser. The system uses hidden Markov models (HMMs) to provide a phonetic transcription and HMM state alignment of a database of single-speaker continuous-speech training data. The runtime synthesiser uses the HMM state...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015